On Low Distortion Embeddings of Statistical Distance Measures into Low Dimensional Spaces
Authors
Abstract
Statistical distance measures have found wide applicability in information retrieval tasks that typically involve high dimensional datasets. To reduce storage space and ensure efficient query processing, dimensionality reduction that preserves inter-point similarity is highly desirable. In this paper, we investigate various statistical distance measures from the point of view of discovering low distortion embeddings into low-dimensional spaces. More specifically, we consider the Mahalanobis distance measure, the Bhattacharyya class of divergences, and the Kullback-Leibler divergence. We present a dimensionality reduction method based on the Johnson-Lindenstrauss Lemma for the Mahalanobis measure that achieves arbitrarily low distortion. Using the Johnson-Lindenstrauss Lemma again, we further demonstrate that the Bhattacharyya distance admits dimensionality reduction with arbitrarily low additive error. Motivated by the availability of efficient indexing schemes on metric spaces, we also examine whether these distance measures embed into metric spaces. We provide explicit constructions of point sets under the Bhattacharyya and the Kullback-Leibler divergences whose embeddings into any metric space incur arbitrarily large distortions. Finally, we show that the lower bound presented for the Bhattacharyya distance is nearly tight by exhibiting an embedding that approaches the lower bound for datasets of relatively small dimension.
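The key fact that makes a Johnson-Lindenstrauss reduction applicable to the Mahalanobis measure can be illustrated directly. The sketch below is an illustrative approximation, not the paper's exact construction; the function jl_mahalanobis_embed and all parameter choices are hypothetical. It uses the fact that d_M(x, y) = sqrt((x - y)^T M (x - y)) with positive definite M equals a Euclidean distance after the Cholesky map x -> L^T x, so a random Gaussian projection that preserves Euclidean distances also preserves d_M.

import numpy as np

def jl_mahalanobis_embed(X, M, k, seed=0):
    # Factor M = L L^T; then d_M(x, y) = ||L^T x - L^T y||_2, i.e. the
    # Mahalanobis distance becomes Euclidean after the map x -> L^T x.
    L = np.linalg.cholesky(M)
    Y = X @ L                      # row i is (L^T x_i)^T
    # Random Gaussian JL projection: preserves Euclidean (hence Mahalanobis)
    # distances within a 1 +/- eps factor when k = O(log n / eps^2).
    rng = np.random.default_rng(seed)
    R = rng.normal(size=(Y.shape[1], k)) / np.sqrt(k)
    return Y @ R

# Usage: compare an original Mahalanobis distance with its embedded counterpart.
d, k, n = 100, 40, 50
A = np.random.default_rng(1).normal(size=(d, d))
M = A @ A.T + np.eye(d)            # a positive definite Mahalanobis matrix
X = np.random.default_rng(2).normal(size=(n, d))
E = jl_mahalanobis_embed(X, M, k)
u, v = X[0] - X[1], E[0] - E[1]
print(np.sqrt(u @ M @ u), np.linalg.norm(v))   # approximately equal

On n points, projecting to k = O(log n / eps^2) dimensions preserves all pairwise Mahalanobis distances within a 1 +/- eps factor with high probability, which is the sense in which the distortion can be made arbitrarily low.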
Similar Resources
On low dimensional local embeddings
We study the problem of embedding metric spaces into low dimensional l_p spaces while faithfully preserving distances from each point to its k nearest neighbors. We show that any metric space can be embedded into l_p^{O(log^2 k)} with k-local distortion of O((log k)/p). We also show that any ultrametric can be embedded into l_p^{O(log k)/eps^3} with k-local distortion 1 + eps. Our embedding results have immediate...
Computational metric embeddings
We study the problem of computing a low-distortion embedding between two metric spaces. More precisely, given an input metric space M, we are interested in computing, in polynomial time, an embedding into a host space M′ with minimum multiplicative distortion. This problem arises naturally in many applications, including geometric optimization, visualization, multi-dimensional scaling, network spa...
Spanners with Slack
Given a metric (V, d), a spanner is a sparse graph whose shortest-path metric approximates the distance d to within a small multiplicative distortion. In this paper, we study the problem of spanners with slack: e.g., can we find sparse spanners where we are allowed to incur an arbitrarily large distortion on a small constant fraction of the distances, but are then required to incur only a cons...
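For context, the sketch below shows the classical greedy t-spanner construction (Althöfer et al.), which the slack variant relaxes; it is illustrative only and not this paper's construction, and the function greedy_spanner is a hypothetical name.

import heapq, itertools

def greedy_spanner(D, t):
    # D: n x n metric distance matrix (list of lists); t > 1 is the stretch.
    # Process pairs by increasing distance; keep edge (u, v) only if the
    # edges kept so far do not already connect u and v within t * D[u][v].
    n = len(D)
    adj = [[] for _ in range(n)]
    edges = []
    for u, v in sorted(itertools.combinations(range(n), 2),
                       key=lambda e: D[e[0]][e[1]]):
        bound = t * D[u][v]
        # Dijkstra over the kept edges, pruned at the stretch bound.
        dist = [float("inf")] * n
        dist[u] = 0.0
        pq = [(0.0, u)]
        while pq:
            du, x = heapq.heappop(pq)
            if du > dist[x] or du > bound:
                continue
            for y, w in adj[x]:
                if du + w < dist[y]:
                    dist[y] = du + w
                    heapq.heappush(pq, (dist[y], y))
        if dist[v] > bound:                 # no t-approximate path exists yet
            adj[u].append((v, D[u][v]))
            adj[v].append((u, D[u][v]))
            edges.append((u, v))
    return edges

# Usage: for 5 points on a line, only the 4 consecutive edges are kept.
pts = [0, 1, 3, 7, 15]
D = [[abs(a - b) for b in pts] for a in pts]
print(greedy_spanner(D, t=2.0))   # [(0, 1), (1, 2), (2, 3), (3, 4)]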
Low dimensional embeddings of ultrametrics
In this note we show that every n-point ultrametric embeds with constant distortion in l_p^{O(log n)} for every 1 ≤ p ≤ ∞. More precisely, we consider a special type of ultrametric with hierarchical structure called a k-hierarchically well-separated tree (k-HST). We show that any k-HST can be embedded with distortion at most 1 + O(1/k) in l_p^{O(k^2 log n)}. These facts have implications to embedding...
Random Feature Maps for Dot Product Kernels
Approximating non-linear kernels using feature maps has gained a lot of interest in recent years due to applications in reducing training and testing times of SVM classifiers and other kernel based learning algorithms. We extend this line of work and present low distortion embeddings for dot product kernels into linear Euclidean spaces. We base our results on a classical result in harmonic anal...
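As an illustration of the flavor of such maps, the sketch below approximates the homogeneous polynomial kernel K(x, y) = (x . y)^p with random Rademacher features. This is a simplification under stated assumptions, not the paper's full construction (which handles general dot product kernels via their Maclaurin series); the function poly_random_features is a hypothetical name.

import numpy as np

def poly_random_features(X, p, D, seed=0):
    # Random feature map for the homogeneous polynomial kernel (x . y)^p.
    # For w with i.i.d. Rademacher (+/-1) entries, E[(w.x)(w.y)] = x.y, so
    # a product of p independent such factors estimates (x.y)^p in expectation.
    rng = np.random.default_rng(seed)
    n, d = X.shape
    Z = np.ones((n, D))
    for _ in range(p):
        W = rng.choice([-1.0, 1.0], size=(d, D))
        Z *= X @ W                 # multiply in one factor (w . x) per level
    return Z / np.sqrt(D)

# Usage: inner products of the features approximate the kernel matrix.
X = np.random.default_rng(1).normal(size=(5, 10))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # unit-norm rows
Z = poly_random_features(X, p=2, D=20000)
print(((X @ X.T) ** 2)[0, 1], (Z @ Z.T)[0, 1])  # approximately equal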
Publication date: 2009